129 research outputs found

    Association Analysis of Somatic Copy Number Alteration Burden With Breast Cancer Survival

    The increasing prevalence of diagnosed breast cancer cases emphasizes the urgent demand for developing new prognostic breast cancer biomarkers. Copy number alteration (CNA) burden, measured as the percentage of the genome affected by CNAs, has emerged as a potential candidate for this purpose. Using somatic CNA data obtained from METABRIC (Molecular Taxonomy of Breast Cancer International Consortium), we implemented Kaplan-Meier estimators and Cox proportional hazards models to examine the association of CNA burden with patients' overall survival (OS) and disease-specific survival (DSS). We also evaluated the association by considering patients' age and tumor subtypes using stratified Cox models. We delineated the distribution of CNA burden in sample genomes and highlighted chromosomes 1, 8, and 16 as the carriers of the highest CNA burden. We identified a strong association between CNA burden and age as well as between CNA burden and breast cancer PAM50 subtypes. We found that, after controlling for the effects of both age (split at 45 years) and PAM50 subtypes on patient survival using stratified Cox models, CNA burden remained significantly associated with patients' overall survival in both the Discovery and Validation data. The same trend was observed in disease-specific survival when only PAM50 subtypes were controlled for in the stratified Cox models. Our analysis showed that there is a significant association between CNA burden and breast cancer survival. This result was also validated using TCGA (The Cancer Genome Atlas) data. CNA burden of breast cancer patients has considerable potential to be used as a novel prognostic biomarker.
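
The two quantities this abstract relies on, CNA burden as a percentage of the genome and a Kaplan-Meier survival estimate, can be illustrated with a minimal sketch. The segment format, the `is_altered` flag, and both function names are assumptions made for illustration; the abstract does not say how METABRIC segments were called as altered, and this is not the authors' pipeline.

```python
from collections import Counter


def cna_burden(segments, genome_length):
    """Percentage of the genome covered by altered segments.

    segments: iterable of (start, end, is_altered) tuples with half-open
    coordinates. Segments are assumed not to overlap (an assumption of
    this sketch).
    """
    altered = sum(end - start for start, end, is_altered in segments if is_altered)
    return 100.0 * altered / genome_length


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times: follow-up times; events: 1 if the event was observed,
    0 if the observation was censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data, not taken from the study.
toy_segments = [(0, 100, False), (100, 150, True), (150, 400, False), (400, 500, True)]
toy_burden = cna_burden(toy_segments, genome_length=1000)
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

A stratified Cox model, as used in the study, would additionally need a dedicated survival library; it is left out here to keep the sketch dependency-free.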

    Using the ratio of means as the effect size measure in combining results of microarray experiments

    Background: Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose, where the measures of association for each study and each gene are combined, weighted by the standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, but this information is usually neglected during the integration. Moreover, it is widely known that the estimated standard deviations are probably unstable in the commonly used effect size measures (such as standardized mean difference) when sample sizes in each group are small. Results: We propose a re-parameterization of the traditional mean difference based effect measure by using the log ratio of means as an effect size measure for each gene in each study. The estimated effect sizes for all studies were then combined under two modeling frameworks: the quality-unweighted random effects models and the quality-weighted random effects models. We defined the quality measure as a function of the detection p-value, which indicates whether a transcript is reliably detected or not on the Affymetrix gene chip. The new effect size measure is evaluated and compared under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets, and also in several data sets of prostate cancer patients and controls. We focus on identifying differentially expressed biomarkers for prediction of cancer outcomes. Conclusion: Our results show that the proposed effect size measure (log ratio of means) has better power to identify differentially expressed genes, and that the detected genes have better performance in predicting cancer outcomes than the commonly used effect size measure, the standardized mean difference (SMD), under both quality-weighted and quality-unweighted data integration frameworks. The new effect size measure and the quality-weighted microarray data integration framework provide efficient ways to combine microarray results.
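
As a rough illustration of the quality-unweighted case, the sketch below computes a log ratio of means per study with a delta-method variance and pools the studies with a DerSimonian-Laird random-effects estimate. The variance approximation and the DerSimonian-Laird estimator are assumptions chosen for the example; the abstract does not state which estimators the authors used. The quality-weighted variant is omitted because the quality function is not given.

```python
import math


def log_ratio_of_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Effect size ln(mean_a / mean_b) with its delta-method variance.

    Both means must be positive (e.g. intensities on the raw scale).
    """
    effect = math.log(mean_a / mean_b)
    variance = sd_a**2 / (n_a * mean_a**2) + sd_b**2 / (n_b * mean_b**2)
    return effect, variance


def random_effects(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled effect, standard error, tau^2)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi**2 for wi in w) / sw
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Toy numbers for one gene, not taken from the paper.
toy_effect, toy_variance = log_ratio_of_means(200.0, 20.0, 10, 100.0, 10.0, 10)
toy_pooled, toy_se, toy_tau2 = random_effects([0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
```

Because the variance term divides each standard deviation by its group mean, the measure avoids dividing the effect itself by a pooled standard deviation, which is the quantity the abstract describes as unstable in small samples.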

    Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer

    Background: Microarray technology has been previously used to identify genes that are differentially expressed between tumour and normal samples in a single study, as well as in syntheses involving multiple studies. When integrating results from several Affymetrix microarray datasets, previous studies summarized probeset-level data, which may potentially lead to a loss of information available at the probe-level. In this paper, we present an approach for integrating results across studies while taking probe-level data into account. Additionally, we follow a new direction in the analysis of microarray expression data, namely to focus on the variation of expression phenotypes in predefined gene sets, such as pathways. This targeted approach can be helpful for revealing information that is not easily visible from the changes in the individual genes. Results: We used a recently developed method to integrate Affymetrix expression data across studies. The idea is based on a probe-level based test statistic developed for testing for differentially expressed genes in individual studies. We incorporated this test statistic into a classic random-effects model for integrating data across studies. Subsequently, we used a gene set enrichment test to evaluate the significance of enriched biological pathways in the differentially expressed genes identified from the integrative analysis. We compared statistical and biological significance of the prognostic gene expression signatures and pathways identified in the probe-level model (PLM) with those in the probeset-level model (PSLM). Our integrative analysis of Affymetrix microarray data from 110 prostate cancer samples obtained from three studies reveals thousands of genes significantly correlated with tumour cell differentiation. The bioinformatics analysis, mapping these genes to the publicly available KEGG database, reveals evidence that tumour cell differentiation is significantly associated with many biological pathways. In particular, we observed that by integrating information from the insulin signalling pathway into our prediction model, we achieved better prediction of prostate cancer. Conclusions: Our data integration methodology provides an efficient way to identify biologically sound and statistically significant pathways from gene expression data. The significant gene expression phenotypes identified in our study have the potential to characterize complex genetic alterations in prostate cancer.
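
The abstract does not say which gene set enrichment test was applied. One common choice, shown here purely as an assumed stand-in, is the one-sided hypergeometric test for over-representation of pathway genes among the selected genes. The function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genes: size of the gene universe
    n_in_pathway: genes in the universe annotated to the pathway
    n_selected: differentially expressed genes
    n_overlap: selected genes that fall in the pathway
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_genes - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy universe of 10 genes: a 3-gene pathway, 3 selected genes, all 3 in the pathway.
toy_p = enrichment_p_value(10, 3, 3, 3)
```

In practice the p-values from many pathways would also need a multiple-testing correction, which this sketch does not perform.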

    Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data

    Computational analysis of high-throughput omics data, such as gene expression, copy number alterations and DNA methylation (DNAm), has become popular in disease studies in recent decades because such analyses can help predict whether a patient has a certain disease or one of its subtypes. However, because these data sets are high-dimensional, with hundreds of thousands of variables and a very small number of samples, traditional machine learning approaches, such as support vector machines (SVMs) and random forests, are limited in their ability to analyze them efficiently. In this chapter, we review the progress in applying deep learning algorithms to solve some biological questions. The focus is on potential software tools and public data sources for the tasks. In particular, we show some case studies using deep neural network (DNN) models for classifying molecular subtypes of breast cancer and DNN-based regression models to account for interindividual variation in triglyceride concentrations measured at different visits of peripheral blood samples using DNAm profiles. We show that integration of multi-omics profiles into DNN-based learning methods could improve the prediction of the molecular subtypes of breast cancer. We also demonstrate the superiority of our proposed DNN models over the SVM model for predicting triglyceride concentrations.

    Modeling Gene-Environment Interaction for the Risk of Non-hodgkin Lymphoma

    Background: Non-Hodgkin lymphoma (NHL) is one of the most common and deadly cancers. There is limited analysis of gene-environment interactions for the risk of NHL. This study intends to explore the interactions between genetic variants and environmental factors, and how they contribute to NHL risk. Methods: A case-control study was performed in Shanghai, China. The cases were diagnosed between 2003 and 2008 in patients aged 18 years or older. Samples and SNPs which did not satisfy quality control were excluded from the analysis. Weighted and unweighted genetic risk scores (GRS) and environmental risk scores were generated using a clustering analysis algorithm. Univariate and multivariable logistic regression analyses were conducted. Moreover, gene-environment interactions (G × E) were tested on the NHL cases and controls. Results: After quality control, 22 SNPs, 11 environmental variables and 5 demographic variables remained for analysis. In the logistic regression analyses, 5 SNPs (rs1800893, rs4251961, rs1800630, rs13306698, rs1799931) and environmental tobacco smoking showed statistically significant associations with the risk of NHL. The odds ratio (OR) and 95% confidence interval (CI) were 10.82 (4.34–28.88) for rs13306698, 2.84 (1.66–4.95) for rs1800893, and 2.54 (1.43–4.58) for rs4251961. In the G × E analysis, the interaction between smoking and dichotomized weighted GRS showed a statistically significant association with NHL (OR = 0.23, 95% CI = [0.09, 0.61]). Conclusions: Several genetic and environmental risk factors and their interactions associated with the risk of NHL have been identified. Replication in other cohorts is needed to validate the results.
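
The relationship between a logistic regression coefficient and a reported odds ratio with its 95% CI can be worked through with the interaction estimate quoted above (OR = 0.23, 95% CI = [0.09, 0.61]). The sketch assumes a Wald interval, which is symmetric on the log-odds scale; the abstract does not say how the intervals were computed, so the reconstruction is approximate. Function names are illustrative.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Recover the standard error implied by a Wald CI for an odds ratio."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported interaction estimate: OR = 0.23, 95% CI = [0.09, 0.61].
implied_se = se_from_ci(0.09, 0.61)
odds_ratio, ci_low, ci_high = odds_ratio_ci(math.log(0.23), implied_se)
```

The implied standard error is about 0.49 on the log-odds scale. Rebuilding the interval around ln(0.23) gives bounds close to, but not exactly, the reported ones, which is expected given rounding in the published figures and the Wald assumption.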

    Using a higher criticism statistic to detect modest effects in a genome-wide study of rheumatoid arthritis

    In high-dimensional studies such as genome-wide association studies, the correction for multiple testing in order to control total type I error results in decreased power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows identification of the presence of modest effects. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence for unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold, we detected the presence of modest effects genome-wide. We also detected modest effects using the 90th percentile of the empirical null distribution as a threshold; however, there was no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, interpreting individual single-nucleotide polymorphisms with significant higher criticism statistics is of undermined value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects.
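
A minimal sketch of a higher criticism statistic and a simulated empirical null threshold follows. It uses the standard Donoho-Jin form computed over the smallest half of the sorted p-values, and it simulates the null with independent uniform p-values. Both choices are assumptions: the abstract does not give the exact variant used, and real genome-wide data are correlated through linkage disequilibrium, so a permutation-based null would be more appropriate in practice.

```python
import math
import random


def higher_criticism(p_values, alpha0=0.5):
    """Maximum standardized excess of small p-values over the uniform expectation."""
    p = sorted(p_values)
    n = len(p)
    best = -math.inf
    for i, p_i in enumerate(p, start=1):
        if i > alpha0 * n:
            break
        if 0.0 < p_i < 1.0:  # skip degenerate values that would divide by zero
            z = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
            best = max(best, z)
    return best


def empirical_threshold(n, percentile, n_sim=200, seed=0):
    """Percentile of the statistic under simulated independent uniform p-values."""
    rng = random.Random(seed)
    null_stats = sorted(
        higher_criticism([rng.random() for _ in range(n)]) for _ in range(n_sim)
    )
    index = min(n_sim - 1, int(percentile / 100.0 * n_sim))
    return null_stats[index]


# Toy p-values, not from the GAW16 data.
toy_hc = higher_criticism([0.01, 0.5, 0.9, 0.95])
```

Comparing an observed statistic against `empirical_threshold(n, 90)`, `empirical_threshold(n, 95)` and `empirical_threshold(n, 99)` mirrors the kind of threshold comparison described in the abstract: a statistic can exceed the 90th percentile while falling short of the stricter ones.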

    Identifying cis- and trans-acting single-nucleotide polymorphisms controlling lymphocyte gene expression in humans

    Assuming multiple loci play a role in regulating the expression level of a single phenotype, we propose a new approach to identify cis- and trans-acting loci that regulate gene expression. Using the Problem 1 data set made available for Genetic Analysis Workshop 15 (GAW15), we identified many expression phenotypes that have significant evidence of association and linkage to one or more chromosomal regions. In particular, six of the ten phenotypes that we found to be regulated by cis- and trans-acting loci were also mapped by a previous analysis of these data, in which a total of 27 phenotypes were identified with expression levels regulated by cis-acting determinants. However, in general, the p-values associated with these regulators identified in our study were larger than in that previous analysis, since we had also identified other factors regulating expression. In fact, we found that most of the gene expression phenotypes are influenced by at least one trans-acting locus. Our study also shows that much of the observable heritability in the phenotypes could be explained by simple single-nucleotide polymorphism associations; residual heritability was reduced and the remaining heritability may represent complex regulation systems with interactions or noise.